Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 569 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 53.5 KiB |
| Average record size in memory | 96.2 B |
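The two memory figures above are related by simple arithmetic. The sketch below reproduces them under assumptions that the report does not state: every column is taken to hold one 8-byte value per row, and the default index is taken to add a fixed 128 bytes. The constant names are illustrative only.

```python
# Assumptions (not stated in the report): each of the 12 columns stores one
# 8-byte value per row, and the row index adds a fixed 128 bytes.
N_ROWS = 569
N_COLS = 12
BYTES_PER_VALUE = 8   # assumed: 64-bit numbers or object pointers
INDEX_BYTES = 128     # assumed overhead of the default index

total_bytes = N_ROWS * N_COLS * BYTES_PER_VALUE + INDEX_BYTES
total_kib = total_bytes / 1024
avg_record_bytes = total_bytes / N_ROWS

print(f"Total size in memory: {total_kib:.1f} KiB")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under these assumptions the rounded results agree with the table, but the true memory layout may differ.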
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 1 |
radius_mean is highly correlated with perimeter_mean and 3 other fields | High correlation |
perimeter_mean is highly correlated with radius_mean and 4 other fields | High correlation |
area_mean is highly correlated with radius_mean and 3 other fields | High correlation |
smoothness_mean is highly correlated with compactness_mean and 4 other fields | High correlation |
compactness_mean is highly correlated with perimeter_mean and 4 other fields | High correlation |
concavity_mean is highly correlated with radius_mean and 5 other fields | High correlation |
concave points_mean is highly correlated with radius_mean and 5 other fields | High correlation |
symmetry_mean is highly correlated with smoothness_mean and 1 other field | High correlation |
fractal_dimension_mean is highly correlated with smoothness_mean | High correlation |
radius_mean is highly correlated with perimeter_mean and 4 other fields | High correlation |
perimeter_mean is highly correlated with radius_mean and 4 other fields | High correlation |
area_mean is highly correlated with radius_mean and 3 other fields | High correlation |
smoothness_mean is highly correlated with compactness_mean and 4 other fields | High correlation |
compactness_mean is highly correlated with radius_mean and 6 other fields | High correlation |
concavity_mean is highly correlated with radius_mean and 6 other fields | High correlation |
concave points_mean is highly correlated with radius_mean and 5 other fields | High correlation |
symmetry_mean is highly correlated with smoothness_mean and 2 other fields | High correlation |
fractal_dimension_mean is highly correlated with smoothness_mean and 1 other field | High correlation |
radius_mean is highly correlated with perimeter_mean and 2 other fields | High correlation |
perimeter_mean is highly correlated with radius_mean and 2 other fields | High correlation |
area_mean is highly correlated with radius_mean and 2 other fields | High correlation |
compactness_mean is highly correlated with concavity_mean and 1 other field | High correlation |
concavity_mean is highly correlated with compactness_mean and 1 other field | High correlation |
concave points_mean is highly correlated with radius_mean and 4 other fields | High correlation |
diagnosis is highly correlated with radius_mean and 6 other fields | High correlation |
radius_mean is highly correlated with diagnosis and 5 other fields | High correlation |
texture_mean is highly correlated with diagnosis | High correlation |
perimeter_mean is highly correlated with diagnosis and 5 other fields | High correlation |
area_mean is highly correlated with diagnosis and 5 other fields | High correlation |
smoothness_mean is highly correlated with compactness_mean and 4 other fields | High correlation |
compactness_mean is highly correlated with diagnosis and 8 other fields | High correlation |
concavity_mean is highly correlated with diagnosis and 8 other fields | High correlation |
concave points_mean is highly correlated with diagnosis and 7 other fields | High correlation |
symmetry_mean is highly correlated with smoothness_mean and 4 other fields | High correlation |
fractal_dimension_mean is highly correlated with smoothness_mean and 3 other fields | High correlation |
id has unique values | Unique |
concavity_mean has 13 (2.3%) zeros | Zeros |
concave points_mean has 13 (2.3%) zeros | Zeros |
Reproduction
| Analysis started | 2022-07-30 19:56:47.247749 |
|---|---|
| Analysis finished | 2022-07-30 19:56:59.800604 |
| Duration | 12.55 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
id
Real number (ℝ≥0)
| Distinct | 569 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30371831.43 |

| Minimum | 8670 |
|---|---|
| Maximum | 911320502 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 8670 |
|---|---|
| 5-th percentile | 90267 |
| Q1 | 869218 |
| median | 906024 |
| Q3 | 8813129 |
| 95-th percentile | 90424461.4 |
| Maximum | 911320502 |
| Range | 911311832 |
| Interquartile range (IQR) | 7943911 |
Descriptive statistics
| Standard deviation | 125020585.6 |
|---|---|
| Coefficient of variation (CV) | 4.116333448 |
| Kurtosis | 42.19319416 |
| Mean | 30371831.43 |
| Median Absolute Deviation (MAD) | 44225 |
| Skewness | 6.473751802 |
| Sum | 1.728157208 × 10^10 |
| Variance | 1.563014683 × 10^16 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 842302 | 1 | 0.2% |
| 90250 | 1 | 0.2% |
| 901315 | 1 | 0.2% |
| 9013579 | 1 | 0.2% |
| 9013594 | 1 | 0.2% |
| 9013838 | 1 | 0.2% |
| 901549 | 1 | 0.2% |
| 901836 | 1 | 0.2% |
| 90251 | 1 | 0.2% |
| 9013005 | 1 | 0.2% |
| Other values (559) | 559 |
| Value | Count | Frequency (%) |
| 8670 | 1 | |
| 8913 | 1 | |
| 8915 | 1 | |
| 9047 | 1 | |
| 85715 | 1 | |
| 86208 | 1 | |
| 86211 | 1 | |
| 86355 | 1 | |
| 86408 | 1 | |
| 86409 | 1 |
| Value | Count | Frequency (%) |
| 911320502 | 1 | |
| 911320501 | 1 | |
| 911296202 | 1 | |
| 911296201 | 1 | |
| 911157302 | 1 | |
| 901034302 | 1 | |
| 901034301 | 1 | |
| 881094802 | 1 | |
| 881046502 | 1 | |
| 871001502 | 1 |
diagnosis
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.6 KiB |
| B | |
|---|---|
| M |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 569 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | M |
|---|---|
| 2nd row | M |
| 3rd row | M |
| 4th row | M |
| 5th row | M |
Common Values
| Value | Count | Frequency (%) |
| B | 357 | |
| M | 212 |
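The frequency column of this table did not survive extraction, but the shares follow directly from the two counts. A minimal sketch, using only the counts shown above:

```python
from collections import Counter

# Counts taken from the "Common Values" table above.
counts = Counter({"B": 357, "M": 212})
total = sum(counts.values())

# Print each class with its share of all observations.
for label, count in counts.most_common():
    print(f"{label}: {count} ({count / total:.1%})")
```

The classes are moderately imbalanced, which is worth keeping in mind when reading the correlation alerts that involve diagnosis.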
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| b | 357 | |
| m | 212 |
Most occurring characters
| Value | Count | Frequency (%) |
| B | 357 | |
| M | 212 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 569 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 357 | |
| M | 212 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 569 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| B | 357 | |
| M | 212 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 569 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| B | 357 | |
| M | 212 |
radius_mean
Real number (ℝ≥0)
| Distinct | 456 |
|---|---|
| Distinct (%) | 80.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.12729174 |

| Minimum | 6.981 |
|---|---|
| Maximum | 28.11 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 6.981 |
|---|---|
| 5-th percentile | 9.5292 |
| Q1 | 11.7 |
| median | 13.37 |
| Q3 | 15.78 |
| 95-th percentile | 20.576 |
| Maximum | 28.11 |
| Range | 21.129 |
| Interquartile range (IQR) | 4.08 |
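The range and IQR rows are differences of other rows in the same table. The sketch below shows one way the quantiles themselves could be computed, assuming linear interpolation between order statistics. The report does not state which interpolation rule it uses, and the `sample` list is hypothetical.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile() requires at least one value")
    pos = (len(data) - 1) * q / 100.0   # fractional index into sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac

# Hypothetical sample, only to exercise the function.
sample = [8.6, 11.7, 12.3, 13.4, 15.8, 20.6, 28.1]
q1, q3 = percentile(sample, 25), percentile(sample, 75)
iqr = q3 - q1
value_range = max(sample) - min(sample)

# Cross-check using the values reported in the table above.
reported_iqr = 15.78 - 11.7       # Q3 - Q1
reported_range = 28.11 - 6.981    # Maximum - Minimum
print(round(reported_iqr, 2), round(reported_range, 3))
```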
Descriptive statistics
| Standard deviation | 3.524048826 |
|---|---|
| Coefficient of variation (CV) | 0.2494497099 |
| Kurtosis | 0.8455216229 |
| Mean | 14.12729174 |
| Median Absolute Deviation (MAD) | 1.9 |
| Skewness | 0.9423795717 |
| Sum | 8038.429 |
| Variance | 12.41892013 |
| Monotonicity | Not monotonic |
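Several rows in this table are derived from one another: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below checks both against the reported figures and adds a small median-absolute-deviation helper. The function names and the `toy` list are illustrative and not part of the report.

```python
import statistics

def coefficient_of_variation(std, mean):
    """CV = standard deviation divided by the mean."""
    return std / mean

def median_absolute_deviation(values):
    """MAD = median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Values copied from the table above.
std, mean = 3.524048826, 14.12729174
cv = coefficient_of_variation(std, mean)
variance = std ** 2
print(f"CV = {cv:.7f}, variance = {variance:.5f}")

# Hypothetical data: the MAD is barely affected by the outlier 100.
toy = [1, 2, 3, 4, 100]
print(median_absolute_deviation(toy))
```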
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 12.34 | 4 | 0.7% |
| 11.71 | 3 | 0.5% |
| 12.46 | 3 | 0.5% |
| 13.05 | 3 | 0.5% |
| 10.26 | 3 | 0.5% |
| 13.85 | 3 | 0.5% |
| 12.77 | 3 | 0.5% |
| 13.17 | 3 | 0.5% |
| 13 | 3 | 0.5% |
| 15.46 | 3 | 0.5% |
| Other values (446) | 538 |
| Value | Count | Frequency (%) |
| 6.981 | 1 | |
| 7.691 | 1 | |
| 7.729 | 1 | |
| 7.76 | 1 | |
| 8.196 | 1 | |
| 8.219 | 1 | |
| 8.571 | 1 | |
| 8.597 | 1 | |
| 8.598 | 1 | |
| 8.618 | 1 |
| Value | Count | Frequency (%) |
| 28.11 | 1 | |
| 27.42 | 1 | |
| 27.22 | 1 | |
| 25.73 | 1 | |
| 25.22 | 1 | |
| 24.63 | 1 | |
| 24.25 | 1 | |
| 23.51 | 1 | |
| 23.29 | 1 | |
| 23.27 | 1 |
texture_mean
Real number (ℝ≥0)
| Distinct | 479 |
|---|---|
| Distinct (%) | 84.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.28964851 |

| Minimum | 9.71 |
|---|---|
| Maximum | 39.28 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 9.71 |
|---|---|
| 5-th percentile | 13.088 |
| Q1 | 16.17 |
| median | 18.84 |
| Q3 | 21.8 |
| 95-th percentile | 27.15 |
| Maximum | 39.28 |
| Range | 29.57 |
| Interquartile range (IQR) | 5.63 |
Descriptive statistics
| Standard deviation | 4.301035768 |
|---|---|
| Coefficient of variation (CV) | 0.2229711841 |
| Kurtosis | 0.7583189724 |
| Mean | 19.28964851 |
| Median Absolute Deviation (MAD) | 2.81 |
| Skewness | 0.6504495421 |
| Sum | 10975.81 |
| Variance | 18.49890868 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 20.52 | 3 | 0.5% |
| 16.85 | 3 | 0.5% |
| 16.84 | 3 | 0.5% |
| 19.83 | 3 | 0.5% |
| 14.93 | 3 | 0.5% |
| 17.46 | 3 | 0.5% |
| 18.9 | 3 | 0.5% |
| 15.7 | 3 | 0.5% |
| 18.22 | 3 | 0.5% |
| 20.22 | 2 | 0.4% |
| Other values (469) | 540 |
| Value | Count | Frequency (%) |
| 9.71 | 1 | |
| 10.38 | 1 | |
| 10.72 | 1 | |
| 10.82 | 1 | |
| 10.89 | 1 | |
| 10.91 | 1 | |
| 10.94 | 1 | |
| 11.28 | 1 | |
| 11.79 | 1 | |
| 11.89 | 1 |
| Value | Count | Frequency (%) |
| 39.28 | 1 | |
| 33.81 | 1 | |
| 33.56 | 1 | |
| 32.47 | 1 | |
| 31.12 | 1 | |
| 30.72 | 1 | |
| 30.62 | 1 | |
| 29.97 | 1 | |
| 29.81 | 1 | |
| 29.43 | 1 |
perimeter_mean
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 522 |
|---|---|
| Distinct (%) | 91.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 91.96903339 |

| Minimum | 43.79 |
|---|---|
| Maximum | 188.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 43.79 |
|---|---|
| 5-th percentile | 60.496 |
| Q1 | 75.17 |
| median | 86.24 |
| Q3 | 104.1 |
| 95-th percentile | 135.82 |
| Maximum | 188.5 |
| Range | 144.71 |
| Interquartile range (IQR) | 28.93 |
Descriptive statistics
| Standard deviation | 24.29898104 |
|---|---|
| Coefficient of variation (CV) | 0.2642082899 |
| Kurtosis | 0.9722135477 |
| Mean | 91.96903339 |
| Median Absolute Deviation (MAD) | 12.71 |
| Skewness | 0.9906504254 |
| Sum | 52330.38 |
| Variance | 590.4404795 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 82.61 | 3 | 0.5% |
| 87.76 | 3 | 0.5% |
| 134.7 | 3 | 0.5% |
| 93.97 | 2 | 0.4% |
| 82.69 | 2 | 0.4% |
| 120.2 | 2 | 0.4% |
| 107.1 | 2 | 0.4% |
| 79.19 | 2 | 0.4% |
| 114.2 | 2 | 0.4% |
| 58.79 | 2 | 0.4% |
| Other values (512) | 546 |
| Value | Count | Frequency (%) |
| 43.79 | 1 | |
| 47.92 | 1 | |
| 47.98 | 1 | |
| 48.34 | 1 | |
| 51.71 | 1 | |
| 53.27 | 1 | |
| 54.09 | 1 | |
| 54.34 | 1 | |
| 54.42 | 1 | |
| 54.53 | 1 |
| Value | Count | Frequency (%) |
| 188.5 | 1 | |
| 186.9 | 1 | |
| 182.1 | 1 | |
| 174.2 | 1 | |
| 171.5 | 1 | |
| 166.2 | 1 | |
| 165.5 | 1 | |
| 158.9 | 1 | |
| 155.1 | 1 | |
| 153.5 | 1 |
area_mean
Real number (ℝ≥0)
| Distinct | 539 |
|---|---|
| Distinct (%) | 94.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 654.8891037 |

| Minimum | 143.5 |
|---|---|
| Maximum | 2501 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 143.5 |
|---|---|
| 5-th percentile | 275.78 |
| Q1 | 420.3 |
| median | 551.1 |
| Q3 | 782.7 |
| 95-th percentile | 1309.8 |
| Maximum | 2501 |
| Range | 2357.5 |
| Interquartile range (IQR) | 362.4 |
Descriptive statistics
| Standard deviation | 351.9141292 |
|---|---|
| Coefficient of variation (CV) | 0.5373644594 |
| Kurtosis | 3.652302762 |
| Mean | 654.8891037 |
| Median Absolute Deviation (MAD) | 153.3 |
| Skewness | 1.645732176 |
| Sum | 372631.9 |
| Variance | 123843.5543 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 512.2 | 3 | 0.5% |
| 1075 | 2 | 0.4% |
| 582.7 | 2 | 0.4% |
| 399.8 | 2 | 0.4% |
| 641.2 | 2 | 0.4% |
| 394.1 | 2 | 0.4% |
| 372.7 | 2 | 0.4% |
| 477.3 | 2 | 0.4% |
| 758.6 | 2 | 0.4% |
| 1138 | 2 | 0.4% |
| Other values (529) | 548 |
| Value | Count | Frequency (%) |
| 143.5 | 1 | |
| 170.4 | 1 | |
| 178.8 | 1 | |
| 181 | 1 | |
| 201.9 | 1 | |
| 203.9 | 1 | |
| 221.2 | 1 | |
| 221.3 | 1 | |
| 221.8 | 1 | |
| 224.5 | 1 |
| Value | Count | Frequency (%) |
| 2501 | 1 | |
| 2499 | 1 | |
| 2250 | 1 | |
| 2010 | 1 | |
| 1878 | 1 | |
| 1841 | 1 | |
| 1761 | 1 | |
| 1747 | 1 | |
| 1686 | 1 | |
| 1685 | 1 |
smoothness_mean
Real number (ℝ≥0)
| Distinct | 474 |
|---|---|
| Distinct (%) | 83.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.0963602812 |

| Minimum | 0.05263 |
|---|---|
| Maximum | 0.1634 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 0.05263 |
|---|---|
| 5-th percentile | 0.075042 |
| Q1 | 0.08637 |
| median | 0.09587 |
| Q3 | 0.1053 |
| 95-th percentile | 0.11878 |
| Maximum | 0.1634 |
| Range | 0.11077 |
| Interquartile range (IQR) | 0.01893 |
Descriptive statistics
| Standard deviation | 0.01406412814 |
|---|---|
| Coefficient of variation (CV) | 0.1459535813 |
| Kurtosis | 0.8559749304 |
| Mean | 0.0963602812 |
| Median Absolute Deviation (MAD) | 0.0095 |
| Skewness | 0.4563237648 |
| Sum | 54.829 |
| Variance | 0.0001977997003 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.1007 | 5 | 0.9% |
| 0.115 | 4 | 0.7% |
| 0.1054 | 4 | 0.7% |
| 0.1075 | 4 | 0.7% |
| 0.1063 | 3 | 0.5% |
| 0.117 | 3 | 0.5% |
| 0.1049 | 3 | 0.5% |
| 0.1044 | 3 | 0.5% |
| 0.1066 | 3 | 0.5% |
| 0.1158 | 3 | 0.5% |
| Other values (464) | 534 |
| Value | Count | Frequency (%) |
| 0.05263 | 1 | |
| 0.06251 | 1 | |
| 0.06429 | 1 | |
| 0.06576 | 1 | |
| 0.06613 | 1 | |
| 0.06828 | 1 | |
| 0.06883 | 1 | |
| 0.06935 | 1 | |
| 0.0695 | 1 | |
| 0.06955 | 1 |
| Value | Count | Frequency (%) |
| 0.1634 | 1 | |
| 0.1447 | 1 | |
| 0.1425 | 1 | |
| 0.1398 | 1 | |
| 0.1371 | 1 | |
| 0.1335 | 1 | |
| 0.1326 | 1 | |
| 0.1323 | 1 | |
| 0.1291 | 1 | |
| 0.1286 | 1 |
compactness_mean
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 537 |
|---|---|
| Distinct (%) | 94.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1043409842 |

| Minimum | 0.01938 |
|---|---|
| Maximum | 0.3454 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 0.01938 |
|---|---|
| 5-th percentile | 0.04066 |
| Q1 | 0.06492 |
| median | 0.09263 |
| Q3 | 0.1304 |
| 95-th percentile | 0.2087 |
| Maximum | 0.3454 |
| Range | 0.32602 |
| Interquartile range (IQR) | 0.06548 |
Descriptive statistics
| Standard deviation | 0.05281275793 |
|---|---|
| Coefficient of variation (CV) | 0.5061554512 |
| Kurtosis | 1.650130467 |
| Mean | 0.1043409842 |
| Median Absolute Deviation (MAD) | 0.03263 |
| Skewness | 1.190123031 |
| Sum | 59.37002 |
| Variance | 0.0027891874 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.1147 | 3 | 0.5% |
| 0.1206 | 3 | 0.5% |
| 0.07698 | 2 | 0.4% |
| 0.05743 | 2 | 0.4% |
| 0.03834 | 2 | 0.4% |
| 0.1516 | 2 | 0.4% |
| 0.1117 | 2 | 0.4% |
| 0.1111 | 2 | 0.4% |
| 0.2087 | 2 | 0.4% |
| 0.1047 | 2 | 0.4% |
| Other values (527) | 547 |
| Value | Count | Frequency (%) |
| 0.01938 | 1 | |
| 0.02344 | 1 | |
| 0.0265 | 1 | |
| 0.02675 | 1 | |
| 0.03116 | 1 | |
| 0.03212 | 1 | |
| 0.03393 | 1 | |
| 0.03398 | 1 | |
| 0.03454 | 1 | |
| 0.03515 | 1 |
| Value | Count | Frequency (%) |
| 0.3454 | 1 | |
| 0.3114 | 1 | |
| 0.2867 | 1 | |
| 0.2839 | 1 | |
| 0.2832 | 1 | |
| 0.2776 | 1 | |
| 0.277 | 1 | |
| 0.2768 | 1 | |
| 0.2665 | 1 | |
| 0.2576 | 1 |
concavity_mean
Real number (ℝ≥0)
HIGH CORRELATION
ZEROS
| Distinct | 537 |
|---|---|
| Distinct (%) | 94.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.08879931582 |

| Minimum | 0 |
|---|---|
| Maximum | 0.4268 |
| Zeros | 13 |
| Zeros (%) | 2.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.0049826 |
| Q1 | 0.02956 |
| median | 0.06154 |
| Q3 | 0.1307 |
| 95-th percentile | 0.24302 |
| Maximum | 0.4268 |
| Range | 0.4268 |
| Interquartile range (IQR) | 0.10114 |
Descriptive statistics
| Standard deviation | 0.07971980871 |
|---|---|
| Coefficient of variation (CV) | 0.8977525105 |
| Kurtosis | 1.998637529 |
| Mean | 0.08879931582 |
| Median Absolute Deviation (MAD) | 0.04046 |
| Skewness | 1.401179739 |
| Sum | 50.5268107 |
| Variance | 0.0063552479 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 13 | 2.3% |
| 0.1204 | 3 | 0.5% |
| 0.1115 | 2 | 0.4% |
| 0.03344 | 2 | 0.4% |
| 0.1103 | 2 | 0.4% |
| 0.1085 | 2 | 0.4% |
| 0.101 | 2 | 0.4% |
| 0.01972 | 2 | 0.4% |
| 0.02995 | 2 | 0.4% |
| 0.1007 | 2 | 0.4% |
| Other values (527) | 537 |
| Value | Count | Frequency (%) |
| 0 | 13 | |
| 0.000692 | 1 | 0.2% |
| 0.0009737 | 1 | 0.2% |
| 0.001194 | 1 | 0.2% |
| 0.001461 | 1 | 0.2% |
| 0.001487 | 1 | 0.2% |
| 0.001546 | 1 | 0.2% |
| 0.001595 | 1 | 0.2% |
| 0.001597 | 1 | 0.2% |
| 0.00186 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.4268 | 1 | |
| 0.4264 | 1 | |
| 0.4108 | 1 | |
| 0.3754 | 1 | |
| 0.3635 | 1 | |
| 0.3523 | 1 | |
| 0.3514 | 1 | |
| 0.3368 | 1 | |
| 0.3339 | 1 | |
| 0.3201 | 1 |
concave points_mean
Real number (ℝ≥0)
HIGH CORRELATION
ZEROS
| Distinct | 542 |
|---|---|
| Distinct (%) | 95.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.04891914587 |

| Minimum | 0 |
|---|---|
| Maximum | 0.2012 |
| Zeros | 13 |
| Zeros (%) | 2.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.0056208 |
| Q1 | 0.02031 |
| median | 0.0335 |
| Q3 | 0.074 |
| 95-th percentile | 0.12574 |
| Maximum | 0.2012 |
| Range | 0.2012 |
| Interquartile range (IQR) | 0.05369 |
Descriptive statistics
| Standard deviation | 0.03880284486 |
|---|---|
| Coefficient of variation (CV) | 0.7932036459 |
| Kurtosis | 1.066555703 |
| Mean | 0.04891914587 |
| Median Absolute Deviation (MAD) | 0.02014 |
| Skewness | 1.171180081 |
| Sum | 27.834994 |
| Variance | 0.001505660769 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 13 | 2.3% |
| 0.02864 | 3 | 0.5% |
| 0.1471 | 2 | 0.4% |
| 0.05778 | 2 | 0.4% |
| 0.02272 | 2 | 0.4% |
| 0.02369 | 2 | 0.4% |
| 0.02377 | 2 | 0.4% |
| 0.02594 | 2 | 0.4% |
| 0.05252 | 2 | 0.4% |
| 0.02031 | 2 | 0.4% |
| Other values (532) | 537 |
| Value | Count | Frequency (%) |
| 0 | 13 | |
| 0.001852 | 1 | 0.2% |
| 0.002404 | 1 | 0.2% |
| 0.002924 | 1 | 0.2% |
| 0.002941 | 1 | 0.2% |
| 0.003125 | 1 | 0.2% |
| 0.003261 | 1 | 0.2% |
| 0.003333 | 1 | 0.2% |
| 0.003472 | 1 | 0.2% |
| 0.004167 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.2012 | 1 | |
| 0.1913 | 1 | |
| 0.1878 | 1 | |
| 0.1845 | 1 | |
| 0.1823 | 1 | |
| 0.1689 | 1 | |
| 0.162 | 1 | |
| 0.1604 | 1 | |
| 0.1595 | 1 | |
| 0.1562 | 1 |
symmetry_mean
Real number (ℝ≥0)
| Distinct | 432 |
|---|---|
| Distinct (%) | 75.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1811618629 |

| Minimum | 0.106 |
|---|---|
| Maximum | 0.304 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 0.106 |
|---|---|
| 5-th percentile | 0.1415 |
| Q1 | 0.1619 |
| median | 0.1792 |
| Q3 | 0.1957 |
| 95-th percentile | 0.23072 |
| Maximum | 0.304 |
| Range | 0.198 |
| Interquartile range (IQR) | 0.0338 |
Descriptive statistics
| Standard deviation | 0.02741428134 |
|---|---|
| Coefficient of variation (CV) | 0.1513247926 |
| Kurtosis | 1.287932992 |
| Mean | 0.1811618629 |
| Median Absolute Deviation (MAD) | 0.0171 |
| Skewness | 0.7256089734 |
| Sum | 103.0811 |
| Variance | 0.0007515428212 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.1714 | 4 | 0.7% |
| 0.1769 | 4 | 0.7% |
| 0.1893 | 4 | 0.7% |
| 0.1601 | 4 | 0.7% |
| 0.1717 | 4 | 0.7% |
| 0.1861 | 3 | 0.5% |
| 0.1966 | 3 | 0.5% |
| 0.1925 | 3 | 0.5% |
| 0.1506 | 3 | 0.5% |
| 0.1739 | 3 | 0.5% |
| Other values (422) | 534 |
| Value | Count | Frequency (%) |
| 0.106 | 1 | |
| 0.1167 | 1 | |
| 0.1203 | 1 | |
| 0.1215 | 1 | |
| 0.122 | 1 | |
| 0.1274 | 1 | |
| 0.1305 | 1 | |
| 0.1308 | 1 | |
| 0.1337 | 1 | |
| 0.1339 | 1 |
| Value | Count | Frequency (%) |
| 0.304 | 1 | |
| 0.2906 | 1 | |
| 0.2743 | 1 | |
| 0.2678 | 1 | |
| 0.2655 | 1 | |
| 0.2597 | 1 | |
| 0.2595 | 1 | |
| 0.2569 | 1 | |
| 0.2556 | 1 | |
| 0.2548 | 1 |
fractal_dimension_mean
Real number (ℝ≥0)
| Distinct | 499 |
|---|---|
| Distinct (%) | 87.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.06279760984 |

| Minimum | 0.04996 |
|---|---|
| Maximum | 0.09744 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.6 KiB |
Quantile statistics
| Minimum | 0.04996 |
|---|---|
| 5-th percentile | 0.053926 |
| Q1 | 0.0577 |
| median | 0.06154 |
| Q3 | 0.06612 |
| 95-th percentile | 0.07609 |
| Maximum | 0.09744 |
| Range | 0.04748 |
| Interquartile range (IQR) | 0.00842 |
Descriptive statistics
| Standard deviation | 0.007060362795 |
|---|---|
| Coefficient of variation (CV) | 0.1124304382 |
| Kurtosis | 3.00589212 |
| Mean | 0.06279760984 |
| Median Absolute Deviation (MAD) | 0.00422 |
| Skewness | 1.304488813 |
| Sum | 35.73184 |
| Variance | 4.98487228 × 10^-5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.06113 | 3 | 0.5% |
| 0.05913 | 3 | 0.5% |
| 0.05907 | 3 | 0.5% |
| 0.05667 | 3 | 0.5% |
| 0.06782 | 3 | 0.5% |
| 0.05866 | 2 | 0.4% |
| 0.0602 | 2 | 0.4% |
| 0.05674 | 2 | 0.4% |
| 0.06412 | 2 | 0.4% |
| 0.06019 | 2 | 0.4% |
| Other values (489) | 544 |
| Value | Count | Frequency (%) |
| 0.04996 | 1 | |
| 0.05024 | 1 | |
| 0.05025 | 1 | |
| 0.05044 | 1 | |
| 0.05054 | 1 | |
| 0.05096 | 1 | |
| 0.05176 | 1 | |
| 0.05177 | 1 | |
| 0.05185 | 1 | |
| 0.05223 | 1 |
| Value | Count | Frequency (%) |
| 0.09744 | 1 | |
| 0.09575 | 1 | |
| 0.09502 | 1 | |
| 0.09296 | 1 | |
| 0.0898 | 1 | |
| 0.08743 | 1 | |
| 0.0845 | 1 | |
| 0.08261 | 1 | |
| 0.08243 | 1 | |
| 0.08142 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
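The calculation described above can be sketched in plain Python: rank both variables, then apply Pearson's formula to the ranks. Giving tied values the mean of their ranks is an assumption of this sketch (a common convention), and the helper names are illustrative rather than taken from the report's software.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but nonlinear in x
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ reaches its maximum while r stays below 1, which illustrates why ρ is better suited to nonlinear monotonic relationships.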
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
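A short sketch of the formula and of the invariance property follows. The data are hypothetical, and the invariance is demonstrated for shifts and positive scale factors only, since a negative factor flips the sign of r.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical measurements.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.1, 3.9, 8.2, 13.8, 22.5]

r = pearson(x, y)
# Shifting and positively rescaling each variable leaves r unchanged.
r_transformed = pearson([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])

# Exact linear relationships give r = 1 whatever the slope.
r_steep = pearson(x, [10 * a for a in x])
r_shallow = pearson(x, [0.1 * a for a in x])
print(r, r_transformed, r_steep, r_shallow)
```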
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
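The pair-counting definition above translates almost directly into code. This sketch implements exactly that definition, often called tau-a, in which tied pairs count as neither concordant nor discordant. Statistical libraries commonly default to a tie-adjusted variant, so results on data with ties may differ from the report's. The function name and sample data are illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, as described above."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
# 7 concordant and 3 discordant pairs out of 10.
print(kendall_tau(x, y))
```

The double loop over pairs is quadratic in the number of observations, which is acceptable for a dataset of this size.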
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | fractal_dimension_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.30010 | 0.14710 | 0.2419 | 0.07871 |
| 1 | 842517 | M | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.08690 | 0.07017 | 0.1812 | 0.05667 |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.19740 | 0.12790 | 0.2069 | 0.05999 |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.24140 | 0.10520 | 0.2597 | 0.09744 |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.19800 | 0.10430 | 0.1809 | 0.05883 |
| 5 | 843786 | M | 12.45 | 15.70 | 82.57 | 477.1 | 0.12780 | 0.17000 | 0.15780 | 0.08089 | 0.2087 | 0.07613 |
| 6 | 844359 | M | 18.25 | 19.98 | 119.60 | 1040.0 | 0.09463 | 0.10900 | 0.11270 | 0.07400 | 0.1794 | 0.05742 |
| 7 | 84458202 | M | 13.71 | 20.83 | 90.20 | 577.9 | 0.11890 | 0.16450 | 0.09366 | 0.05985 | 0.2196 | 0.07451 |
| 8 | 844981 | M | 13.00 | 21.82 | 87.50 | 519.8 | 0.12730 | 0.19320 | 0.18590 | 0.09353 | 0.2350 | 0.07389 |
| 9 | 84501001 | M | 12.46 | 24.04 | 83.97 | 475.9 | 0.11860 | 0.23960 | 0.22730 | 0.08543 | 0.2030 | 0.08243 |
Last rows
| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | fractal_dimension_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 559 | 925291 | B | 11.51 | 23.93 | 74.52 | 403.5 | 0.09261 | 0.10210 | 0.11120 | 0.04105 | 0.1388 | 0.06570 |
| 560 | 925292 | B | 14.05 | 27.15 | 91.38 | 600.4 | 0.09929 | 0.11260 | 0.04462 | 0.04304 | 0.1537 | 0.06171 |
| 561 | 925311 | B | 11.20 | 29.37 | 70.67 | 386.0 | 0.07449 | 0.03558 | 0.00000 | 0.00000 | 0.1060 | 0.05502 |
| 562 | 925622 | M | 15.22 | 30.62 | 103.40 | 716.9 | 0.10480 | 0.20870 | 0.25500 | 0.09429 | 0.2128 | 0.07152 |
| 563 | 926125 | M | 20.92 | 25.09 | 143.00 | 1347.0 | 0.10990 | 0.22360 | 0.31740 | 0.14740 | 0.2149 | 0.06879 |
| 564 | 926424 | M | 21.56 | 22.39 | 142.00 | 1479.0 | 0.11100 | 0.11590 | 0.24390 | 0.13890 | 0.1726 | 0.05623 |
| 565 | 926682 | M | 20.13 | 28.25 | 131.20 | 1261.0 | 0.09780 | 0.10340 | 0.14400 | 0.09791 | 0.1752 | 0.05533 |
| 566 | 926954 | M | 16.60 | 28.08 | 108.30 | 858.1 | 0.08455 | 0.10230 | 0.09251 | 0.05302 | 0.1590 | 0.05648 |
| 567 | 927241 | M | 20.60 | 29.33 | 140.10 | 1265.0 | 0.11780 | 0.27700 | 0.35140 | 0.15200 | 0.2397 | 0.07016 |
| 568 | 92751 | B | 7.76 | 24.54 | 47.92 | 181.0 | 0.05263 | 0.04362 | 0.00000 | 0.00000 | 0.1587 | 0.05884 |